Learning Models for Aligning Protein Sequences with Predicted Secondary Structure
Abstract
Accurately aligning distant protein sequences is notoriously difficult. A recent approach to improving alignment accuracy is to use additional information such as predicted secondary structure. We introduce several new models for scoring alignments of protein sequences with predicted secondary structure, which use the predictions and their confidences to modify both the substitution and gap cost functions. We present efficient algorithms for computing optimal pairwise alignments under these models, all of which run in near-quadratic time. We also review inverse alignment, an approach to learning the values of the parameters in these models. We then evaluate the accuracy of these models by studying how well an optimal alignment under each model recovers known benchmark reference alignments. Our experiments show that, with parameters learned by inverse alignment, these new secondary-structure-based models provide a significant improvement in alignment accuracy for distant sequences. The best model improves upon the accuracy of the standard sequence alignment model for pairwise alignment by as much as 15% for sequences with less than 25% identity, and improves the accuracy of multiple alignment by 20% for difficult benchmarks whose average accuracy under standard tools is less than 40%.
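To make the idea of structure-aware scoring concrete, here is a minimal sketch of a quadratic-time global alignment in which predicted secondary structure and its confidences adjust both the substitution score and the gap cost. It is not one of the models from the paper: the function name, the multiplicative confidence bonus, the linear gap cost, and all default parameter values are assumptions chosen only for illustration.

```python
from typing import Sequence


def ss_aware_alignment_score(
    a: str, b: str,
    ss_a: str, ss_b: str,
    conf_a: Sequence[float], conf_b: Sequence[float],
    match: float = 1.0, mismatch: float = -1.0,
    gap: float = -2.0, ss_bonus: float = 1.0, ss_gap_penalty: float = 1.0,
) -> float:
    """Optimal global alignment score (Needleman-Wunsch with linear gaps).

    Hypothetical scoring form, not the paper's models:
    ss_a / ss_b hold one predicted state per residue ('H', 'E' or 'C'),
    conf_a / conf_b hold the prediction confidences in [0, 1].
    """
    m, n = len(a), len(b)

    def sub(i: int, j: int) -> float:
        # Base residue score, plus a bonus when both predicted states agree,
        # weighted by the product of the two confidences (an assumption).
        score = match if a[i] == b[j] else mismatch
        if ss_a[i] == ss_b[j]:
            score += ss_bonus * conf_a[i] * conf_b[j]
        return score

    def gap_score(ss: str, conf: Sequence[float], k: int) -> float:
        # Deleting residue k costs more when it is confidently predicted
        # to lie in a helix or strand (an assumption).
        extra = ss_gap_penalty * conf[k] if ss[k] in "HE" else 0.0
        return gap - extra

    # dp[i][j] = best score aligning a[:i] with b[:j]; O(m * n) time and space.
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + gap_score(ss_a, conf_a, i - 1)
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + gap_score(ss_b, conf_b, j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(i - 1, j - 1),
                dp[i - 1][j] + gap_score(ss_a, conf_a, i - 1),
                dp[i][j - 1] + gap_score(ss_b, conf_b, j - 1),
            )
    return dp[m][n]
```

In this sketch, confident and agreeing predictions raise the score of aligning two residues, while gaps inside confidently predicted helices or strands are penalized more heavily. In the paper's setting, weights such as `ss_bonus` and `ss_gap_penalty` would be learned from reference alignments by inverse alignment rather than set by hand.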
Related resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
Aligning Protein Sequences with Predicted Secondary Structure
Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequence...
Isolation and characterization of Phi class glutathione transferase partial gene from Iranian barley
Glutathione transferases are multifunctional proteins involved in several diverse intracellular events such as primary and secondary metabolisms, signaling and stress metabolism. These enzymes have been subdivided into eight classes in plants. The Phi class, being plant specific, is the most represented. In the present study, based on the sequences available at GenBank, different primers were d...
Fold recognition using predicted secondary structure sequences and hidden Markov models of protein folds.
We present an analysis of the blind predictions submitted to the fold recognition category for the second meeting on the Critical Assessment of techniques for protein Structure Prediction. Our method achieves fold recognition from predicted secondary structure sequences using hidden Markov models (HMMs) of protein folds. HMMs are trained only with experimentally derived secondary structure sequ...
Stochastic context-free grammars for tRNA modeling.
Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of tRNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. Results show that after having been trained on as few as 20 tRNA sequences from only two tRNA subfamilies (mit...
Publication date: 2009